47 research outputs found

    Model Extraction Warning in MLaaS Paradigm

    Full text link
    Cloud vendors are increasingly offering machine learning services as part of their platform and services portfolios. These services enable the deployment of machine learning models on the cloud that are offered on a pay-per-query basis to application developers and end users. However recent work has shown that the hosted models are susceptible to extraction attacks. Adversaries may launch queries to steal the model and compromise future query payments or privacy of the training data. In this work, we present a cloud-based extraction monitor that can quantify the extraction status of models by observing the query and response streams of both individual and colluding adversarial users. We present a novel technique that uses information gain to measure the model learning rate by users with increasing number of queries. Additionally, we present an alternate technique that maintains intelligent query summaries to measure the learning rate relative to the coverage of the input feature space in the presence of collusion. Both these approaches have low computational overhead and can easily be offered as services to model owners to warn them of possible extraction attacks from adversaries. We present performance results for these approaches for decision tree models deployed on BigML MLaaS platform, using open source datasets and different adversarial attack strategies

    Preprocessors Matter! Realistic Decision-Based Attacks on Machine Learning Systems

    Full text link
    Decision-based adversarial attacks construct inputs that fool a machine-learning model into making targeted mispredictions by making only hard-label queries. For the most part, these attacks have been applied directly to isolated neural network models. However, in practice, machine learning models are just a component of a much larger system. By adding just a single preprocessor in front of a classifier, we find that state-of-the-art query-based attacks are as much as seven times less effective at attacking a prediction pipeline than attacking the machine learning model alone. Hence, attacks that are unaware of this invariance inevitably waste a large number of queries to re-discover or overcome it. We, therefore, develop techniques to first reverse-engineer the preprocessor and then use this extracted information to attack the end-to-end system. Our extraction method requires only a few hundred queries to learn the preprocessors used by most publicly available model pipelines, and our preprocessor-aware attacks recover the same efficacy as just attacking the model alone. The code can be found at https://github.com/google-research/preprocessor-aware-black-box-attack.Comment: Code can be found at https://github.com/google-research/preprocessor-aware-black-box-attac

    Considerations for Differentially Private Learning with Large-Scale Public Pretraining

    Full text link
    The performance of differentially private machine learning can be boosted significantly by leveraging the transfer learning capabilities of non-private models pretrained on large public datasets. We critically review this approach. We primarily question whether the use of large Web-scraped datasets should be viewed as differential-privacy-preserving. We caution that publicizing these models pretrained on Web data as "private" could lead to harm and erode the public's trust in differential privacy as a meaningful definition of privacy. Beyond the privacy considerations of using public data, we further question the utility of this paradigm. We scrutinize whether existing machine learning benchmarks are appropriate for measuring the ability of pretrained models to generalize to sensitive domains, which may be poorly represented in public Web data. Finally, we notice that pretraining has been especially impactful for the largest available models -- models sufficiently large to prohibit end users running them on their own devices. Thus, deploying such models today could be a net loss for privacy, as it would require (private) data to be outsourced to a more compute-powerful third party. We conclude by discussing potential paths forward for the field of private learning, as public pretraining becomes more popular and powerful

    Evaluating Superhuman Models with Consistency Checks

    Full text link
    If machine learning models were to achieve superhuman abilities at various reasoning or decision-making tasks, how would we go about evaluating such models, given that humans would necessarily be poor proxies for ground truth? In this paper, we propose a framework for evaluating superhuman models via consistency checks. Our premise is that while the correctness of superhuman decisions may be impossible to evaluate, we can still surface mistakes if the model's decisions fail to satisfy certain logical, human-interpretable rules. We instantiate our framework on three tasks where correctness of decisions is hard to evaluate due to either superhuman model abilities, or to otherwise missing ground truth: evaluating chess positions, forecasting future events, and making legal judgments. We show that regardless of a model's (possibly superhuman) performance on these tasks, we can discover logical inconsistencies in decision making. For example: a chess engine assigning opposing valuations to semantically identical boards; GPT-4 forecasting that sports records will evolve non-monotonically over time; or an AI judge assigning bail to a defendant only after we add a felony to their criminal record.Comment: 31 pages, 15 figures. Under review. Code and data are available at https://github.com/ethz-spylab/superhuman-ai-consistenc
    corecore